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Abstract 

The presented work investigates a sparse Bayesian incremental automatic relevance determination 
(IARD) algorithm in the context of multipath parameter estimation in a super-resolution regime. The 
corresponding estimation problem is highly nonlinear and, in general, requires an estimation of the 
number of multipath components. In the IARD approach individual multipath components are processed 
sequentially, which permits a tractable convergence analysis of the corresponding inference expressions. 
This leads to a simple condition, termed here a pruning condition, that determines if a multipath 
component is “sparsified” or retained in the model, thus realizing a sparse estimator and permitting 
a fast and adaptive realization of the estimation algorithm. Yet previous experiments demonstrated that 
IARD fails to select the correct number of components when the parameters entering nonlinearly the 
multipath model are also estimated. To understand this effect, an analysis of the statistical structure 
of the pruning condition from the perspective of statistical hypothesis testing is proposed. It is shown 
that the corresponding test statistic in the pruning condition follows an extreme value distribution. As a 
result, when applied to the problem of multipath estimation, the standard IARD algorithm implements a 
statistical test with a very high probability of false alarm. This leads to insertion of estimation artifacts 
and underestimation of signal sparsity. Moreover, the probability of false alarm worsens as the number 
of measured signal samples grows. Based on the developed statistical interpretation of the IARD, an 
optimal adjustment of the pruning condition is proposed. This permits a reliable and efficient removal of 
estimation artifacts and joint estimation of signal parameters, as well as optimal model order selection 
within a sparse Bayesian learning framework. The presented experiments demonstrate the effectiveness 
of this approach. 
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Index Terms 

Super-resolution channel estimation, model order selection, sparse Bayesian learning. 

I. Introduction 

Multipath propagation is known to have a significant impact on the performance of wireless commu¬ 
nication or localization systems. However, when the multipath channel structure is known, it can offer a 
key to a reliable high-rate data communication or accurate localisation. 

Typically, a multipath wireless channel is assumed to consist of a linear combination of a finite number 
of L discrete propagation paths, which we term multipath components, embedded in a white additive 
ambient noise and a non-white random process that represents diffuse propagation. While multipath 
components can be deterministically described by a set of parameters - dispersion parameters that 
characterize specular waves propagating from the transmitter site to the receiver site, such as a propagation 
delay, direction of departure, direction of arrival, and a Doppler frequency - diffuse components are of 
a random nature and are characterized statistically fT|, |[2], 0. In this work we are concerned with an 
estimation of the discrete multipath components as they are a very sought-after characteristic of a wireless 
propagation channel due to their direct relationship to the geometry of the propagation environment. 

Historically, the problem of multipath component parameter estimation has been solved using a com¬ 
bination of two techniques: super-resolution (SR) parameter estimation algorithms (see e.g., 0, £Q, 10 
and references therein) and model order selection 0, J7J, 0. Parameter estimation algorithms are used 
to find the parameters of multipath components given measurement data and a model of a multipath 
channel with a known number of superimposed components. SR property of the estimation algorithm 
is essential, as an accurate estimation of component parameters beyond bandwidth resolution is often 
required. Expectation-Maximization (EM) type of algorithms (21, 0, J9|, 0 are often used for this 
purpose. They allow simplifying the numerical optimization of the objective function with respect to 
the dispersion parameters that enter the channel model nonlinearly. Unfortunately, these techniques are 
applicable only when the order of the model, i.e., the number of specular components is known - a 
requirement that is rarely satisfied in practice. This has motivated the use of model order selection 
techniques, such as Bayesian Information criterion or Minimum description length and similar ITOl . 171 . 
0, 0 to determine the number of components in the model. These methods select the model order 
by balancing the model complexity, i.e., a total number of parameters to be estimated, with a norm 
of the residual error. Yet for the considered problem these algorithms become computationally very 
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demanding: in order to find the optimal model order, the parameters of models with different number 
of components have to be estimated first, and then compared using selected criterion. In practice, the 
number of components can range from a only a few to several tens of components, making separate 
parameter estimation and model order selection very inefficient, especially in time-varying scenarios, 
where the number of components can change fTTil . l lT2l . 

To make estimation more efficient, we propose a variational Bayesian wireless channel estimator that 
combines model order selection and parameter estimation within a single framework. The proposed 
solution is based on merging a variational Bayesian parameter estimation 021 . 0~4il . which generalizes 
classical EM-based SR parameter estimation algorithms, and sparse Bayesian learning (SBL) techniques 
031, Ifl6l . fm . Sparse reconstruction of a multipath channel can effectively solve the model order 
selection problem, since irrelevant multipath components will be “sparsified” by the algorithm; sparsity, 
thus, effectively controls the complexity of the estimated models. 

Such multipath estimation approaches have been to some extent explored in ifTSl and ff9l . In iflSl 
the authors casted the Space Alternating Generalized Expectation-Maximization (SAGE) algorithm for 
multipath parameter estimation^ in a variational Bayesian framework. The new algorithm, termed varia¬ 
tional Bayesian SAGE (VB-SAGE), introduces sparsity priors to jointly estimate model order via sparsity 
penalization and estimate the parameters of multipath components. The VB-SAGE algorithm makes a 
typical assumption on the independence of individual components. In llT9l this assumption is relaxed 
by considering correlations between the gains of propagation paths. By adopting a special class of SBL 
algorithms, known as incremental Automatic Relevance Determination (IARD) BT1 . Il22l . Il23l , E4l . a 
new algorithm is proposed that, as we will show here, generalizes the VB-SAGE algorithm. A key feature 
of both VB-SAGE and IARD algorithms is the structure of variational inference expressions that leads to 
a simple numerical condition for removing or keeping a component in the model. It is this condition that 
eventually leads to sparse estimate. Further in the text we refer to this condition as a pruning condition. 
The pruning condition permits the reduction of the model complexity “on the fly”, while the components 
are updated. In this way model order selection and parameter estimation are realized jointly. 

It has been observed, however, that some of the estimated multipath components have small, yet non¬ 
zero weights 021 • In other words, the IARD and VB-SAGE estimators compress the measured signal, 
but overestimate the model order. To cancel erroneous components an empirical threshold was adopted 
in ll24l . 021 , l l22l . The selection of the threshold exploits the link between the pruning condition and an 

‘See (20) and gj for the details on the SAGE algorithm. 
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estimate of the per-component signal-to-noise ratio (SNR). Yet it remains unclear whether a particular 
choice of the threshold can be motivated more formally. A better understanding of these aspects can 
be exploited not only for improving performance of IARD schemes in the presence of noise and better 
understanding of the IARD performance in general, but for an accurate and fast extraction of specular 
multipath components, as we argue in this paper. 

Thus, our goals in this work can be formulated as follows: we aim to further the theoretical under¬ 
standing of IARD within the context of sparse estimation of multipath component and present a more 
detailed analysis of the pruning condition used in the IARD algorithms. Specifically, we show that the 
IARD algorithm generalizes VB-SAGE. Also, we demonstrate that the pruning condition used in IARD is 
equivalent to a statistical hypothesis test applied to a specific multipath component under the assumption 
that the other multipath components are fixed. With this new interpretation it becomes possible to show 
that (i) within the IARD scheme the presence of a component in the model can be determined using a 
statistical hypothesis test of a desired test size, (ii) the test is a uniformly most powerful (UMP), (iii) 
probability of false alarm for this test (i.e., the probability of falsely accepting a component in the model) 
is upper-bounded, with the standard IARD algorithm implementing the test with the highest probability 
of false alarm. 

Throughout this paper we shall make use of the following notation. Vectors arc represented as boldface 
lowercase letters, e.g., x, and matrices as boldface uppercase letters, e.g., X. For vectors and matrices 
(■) H denotes the Her mi t!an transpose. We write [X]p.j to denote an element of the matrix X at the 
kth row and Zth column. The expression diag(tc) stands for a diagonal matrix with the elements of x 
on the main diagonal. For some positive-semidefinite matrix A, notation 11 ® 11 ^ = Vx H Ax denotes 
a weighted (2 norm of a vector x. We write E ? ( x ) f(x) to denote the expectation of the function fix) 
under the probability density function q(x). Finally, for a random vector x, CN(a?|a, B) denotes a circular 
complex multivariate Gaussian pdf with mean a and covariance matrix B: similarly, for a random variable 
x , Ga(x|a, 6 ) = ]^j% a 1 exp(— 6 . 1 ;) denotes a gamma pdf with parameters a and b. 

II. Signal model 

In the following sections we outline the used signal model. Also, the corresponding probabilistic 
formulation of the inference problem that builds the foundation for the variational Bayesian parameter 
estimation adopted here is presented. 
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A. Multipath channel model 

Consider for simplicity a single-input-single-output (SISO) wireless channel The received signal 
y(t) can be represented as a superposition of an unknown number L of specular multipath components 
wis(t;Oi ) contaminated by additive noise £(f) (see e.g., 0 , lfT 8 ll . If25l ): 

L 

y(t) = wis(t-, Oi) + £(t). (l) 

i=i 

In (OQ) wi is a complex-valued multipath gain and s(t: 0/) is an altered version of some transmitted signal 
x(t). The alteration process is described by a (non-linear) mapping x[t) i -7 s(t: Of), where 0; is the 
vector of dispersion parameters, e.g., relative delay, Doppler shift, etc. For a SISO channel, s(t; 6 i ) can 
be represented as s(t;9i) = s{t\Tpu{) = e j 27 ri/it a:(f — 77 ), where 61 = [ti,vi] t , 17 is a delay of the Ith 
multipath component and z// is its Doppler shift. In general, the nonlinear mapping x(t) 1-7 s(t,0i) also 
includes the measurement system effects, e.g., signal distortions at the transmitter and the receiver due 
to analog filtering, RF components, etc. Additive noise £(t) is assumed to be a zero-mean wide-sense 
stationary Gaussian process. In addition to white noise, this term will also include effects due to diffuse 
scattering 0 , 0 . 

In practice the signal y{t) is sampled with the sampling period T s , resulting in N discrete measurement 
samples. By stacking the samples in a vector y = [y(0),... ,y((N — 1)T s )] t , model CO) can be rewritten 
in a more convenient matrix form as 

L 

y = ^Z w i s ( e i) + £ = s(&)w + ^, ( 2 ) 

i=i 

where we define s(0{) = [s(O;0j),.. .,s((N - 1 )T S ; 9{)] T , 0 = [0i,... ,0 L ], w = [mi,... ,w L ] T , and 
5(0) = [s(0i),..., s(0i)]. The term ^ = [^(0),..., £((iV — 1)T s )] t is the additive noise vector that 
follows a circular complex normal distribution with covariance matrix E = A -1 . In the following 

we will assume that A is known or has been estimated; the estimation of diffuse scattering statistics and 
white noise statistics we will leave outside the scope of this work. 

B. Probabilistic structure of the multipath channel model 

Expression 0 is the starting point for the multipath parameter estimation algorithms. Given 0, 
the joint model order selection and parameter estimation aims at determining the values of L, w, 

2 The proposed method can also be extended to MIMO time-variant channels with stationary propagation constellation. This 
will, however, lead to a more complicated signal model with additional dispersion parameters, while not adding any new aspect 
relevant to the understanding of the proposed methods. 
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and 0. For fixed L both w and 0 can be found using classical maximum a posteriori (or maximum 
likelihood) approach, which amounts to a numerical maximization of the corresponding probability density 
function (pdf) p(w,&\y) oc p(y\w,®)p(w,@), where p{y\w.@) = CN(y\S(®)w, A -1 ) following 
©. Unfortunately, in majority of practical cases the number of multipath components L is not known. 
A possible approach to circumvent an explicit specification of the model order consists of imposing 
sparsity constraints on w. The advantage of such approach is a joint model order selection and parameter 
estimation within a Bayesian inference framework, as will be outlined below. 

A classical SBL approach lU5l . IT6l . flT71 assumes a hierarchical factorable prior p(w\a.)p(a) = 
n£=i p{wi\ai)p(ai) for the weights w, where p(wi\ai) = CN(^|0, oij 1 ). Parameters cq, also called 
sparsity parameters, regulate the width of this pdf and must be estimated along with the other model 
parameters - an approach referred to as empirical Bayes. 

In IARD version of SBL two techniques are combined. First, the hyperprior p(a) is assumed to 
be non-informative by selecting p(a) oc nf=i a i l - Such choice is known as automatic relevance 
determination (ARD). The resulting inference scheme is then similar to a weighted version of minimum 
/:)-norm regression and basis pursuit denoising (see lf26l . | [27l . 1 1281 ) - more traditional “non-Bayesian” 
methods for learning sparse representations. Second, in the incremental inference approach to the SBL the 
corresponding objective function is optimized with respect to the parameters of one component per single 
algorithm iteration. Such incremental optimization permits a fast estimation of sparsity parameters li2Tl . 
ll22l . |[23l . Moreover, it also underlies the EM-based multipath estimation schemes, since it simplifies 
nonlinear optimizations with respect to dispersion parameters 0. This motivates a combination of IARD 
and multipath inference schemes in a single framework. 

The joint multipath parameter estimation and model order selection within IARD amounts to inference 
of the joint posterior pdf 


p(w, 0, a\y) oc p(y\w, ®)p(w\a)p(a)p(@), 


(3) 


where we explicitly assume that p{w, 0, a) = p(w\a)p(a)p(®). Unfortunately, © cannot be evaluated 
in closed form, but can be approximated using, e.g., variational Bayesian techniques fl4l . l lT3l . The latter 
aims at estimating an approximating pdf q(w, 0, a) by maximizing the lower bound of the log-evidence 

log p{y) ■ 


log p{y) > 


p(w, 0, q, y) 

E log— -—-—, 

g(w,@,ct ) q(w, 0, cc) 


(4) 


which is equivalent to minimizing the Kullback-Leibler divergence between q(w, 0, a) and the intractable 
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p(w, 0, a\y). The complexity of the inference depends on the choice of q(w, 0, a). Here we will assume 
that 

L 

q(w,@,cx) = q(w)Y[q{O k )q(a k ). (5) 

k =1 

Let us now specify each factor in ©. First, we will select q(0i) = 5(0i — 9{). This assumption results 
in a point estimate of the dispersion parameters. This choice simplifies the numerical optimization of the 
right-hand side of ©. For the factor q(w) we will consider two assumptions: 

L L 

(Al) : q(w) = $;), (6) 

1=1 i=i 

(A2) : q(w) = CN(to|w), <&). (7) 

Al explicitly enforces a statistical independence between individual multipath components; this assump¬ 
tion underlies the SAGE (5} and the VB-SAGE algorithms lfT8l for multipath parameter estimation. Under 
the assumption A2 the gains of the components are assumed to be correlated. This formulation is used 
in a classical SBL and in the IARD algorithm for multipath estimation in llT9!l . In the following we will 
consider both assumptions and investigate their impact on multipath estimation and detection. Let us 
mention here that Al can be obtained as a special case of A2 by constraining $ to a diagonal matrix. 
The form of the factor q(a) can be obtained analytically as a maximizer of © for the chosen form of 
q(0i) and q(w). For the IARD case it can be shown fl8l . l22ll that 

L L 

q( a ) = n ?(«/) = n Ga (ai; 1, af 1 ), 

i=i i=i 

i.e., q(ai) is parameterized by a single coefficient oq. 

The maximization of the bound in © then reduces to the estimation of the parameters w, <F. 2;, and 
9i, l = 1, ..., L that parameterize ©. In what follows we describe this in more details. 

C. Incremental variational inference of model parameters 

The IARD algorithm optimizes © with respect to the parameters of one component per iteration, 
cycling through the components in a round-robin fashion. Consider now the variational inference steps 
for a single component l. We will begin with the estimation of q(9i). To this end we define 0 / = 
[0i,..., 02 -i, 02 +i,..., 9l\ as a set of dispersion parameters obtained by removing 0; from 0, and 


March 9, 2015 


DRAFT 


assume that the pdfs q(w), q(a), and are available^ The bound in © on log p(y) with respect 

to q( 6 i) can then be expressed as log p(y) > E^g,) log (p{ 01 )/q{ 61 )), where 

( 8 ) 


p(Qi) cx exp I E log p(y\w,@)p(@) . 

,) ) 

This bound is maximized when the Kullback-Leibler divergence between q{0{) and p( 6 i) is minimal. Due 
to the assumed form of q(Oi), this is achieved when 61 is aligned with the mode of p{ 6 i). By computing 
the expectation in © it can be shown that 


61 = argmax \ log p( 6 i) - ||r z - wis( 6 i )|| 


E 25ft{[%,, S (0 fc )*A S (0O} - [*]i,t||s(0i)lli}, 

k=l,k^l 


(9) 


where 5ft {•} denotes the real part operator and 


ri = y- E 

k=l,kj=l 


( 10 ) 


is a residual signal that cancels the contribution of the other L — 1 components. Solving © requires in 
general a numerical optimization. Let us point out that the last two terms in © account for correlations 
between the elements of w, acting as penalty factors in the estimator of 6/. Also, note that under the 
assumption A1 © coincides with the estimation expression used in the VB-SAGE algorithm lfl8l . 

Now, let us consider the estimation of q{a{). The bound in © with respect to q(ai) can be expressed 
as log p(y) > K qM logp(ai)/q(ai), where 

p(ai) ocexpf E log p(wi\ai)p(ai) 

\q(w) 

It has been demonstrated in lfl8l (for the assumption Al) and in lf22ll (for the assumption A2) that the 
sequence of estimates {qM(cq), q^(ai), ql 2 l (an ),...}, obtained by repeated maximization of the right- 
hand side of © with respect to the pdfs q{w{) and q(cq) (for Al), or q(w) and q{a{) (for A2), converges 
to the pdf q[°°](cq) = Ga(cq|l, (aj 00 ^ 1 ) with 


qoo] _ / (N 2 -$) \ > 1 


oo, 


M 2 


< 1. 


( 11 ) 


’In other words, we assume that the parameters of the corresponding pdfs are known. 
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The parameters q and /// in (fTTb are computed as follows. For the assumption Al: 


(Al) : q = l/\\s(di)\\ 2 A , m = qs(di) H Ar\ A1 \ 

L 

r [ ^ 1] =y- ^2 yj k s(e k ). 

k=l,k^l 

For the assumption A2, we first define 

A~i = diag([Si,..., S/ + i,..., Sl]), 

= [ 8 ( 0 1 ), . . . , r), s(0 Z+ l), • • • , 8 ( 0 L )], 

*-i = (s(e_ z ) H AS(0_ z ) + A^y 1 , 

m_/ = &_iS(®-i) H Ay, and 
r[ A2] = y - S(G_i)w_i. 

Then, q and /// for this assumption are evaluated as follows 


(12) 


(13) 


(A2) : q = (s(0*) H A8(0 l )~ 

s(0 l ) H AS(e- l )*-iS(e- l ) H Aa(0 l )y\ 

m =qs(0i) H Ay- 

qs(0 l ) H AS(e- l )*- l S(Q- l ) H Ay 
=qs(0i) H A(y - S(®-i)w-i ) 
=Q S (0^Arf 2] . 


(14) 


Let us point out that for both Al and A2 cases, the weight /q is a projection of s(0/) on the corresponding 
residual signal r\ or r\ , respectively. The latter are computed by canceling (subtracting) the contribu¬ 
tion of the other L— 1 components. Note that r[ A1 ^ coincides with (1 1 Oh ; also, r[ A2 ^ and rj A1 ^ are equal when 
%l is diagonal, i.e., for uncorrelated components. This will be a valid assumption for components that 
are physically well separated, i.e., when s(di) H As(Q k ) ~ 0, k / l. Thus, for uncorrelated components 
the IARD and the VB-SAGE algorithms will lead to the same estimation results. Also, when assumption 
Aoo is used with IARD, an instance of the VB-SAGE algorithm is obtained. Yet IARD does not require 
an introduction of any latent variables, as it was done in the VB-SAGE algorithm. 


March 9, 2015 


DRAFT 


10 


Finally, we estimate q(wi) and q(w). For the assumption A1 the parameters of q(wi) are computed as 


-i 


$t= llaWlli + Sp ,wi = Art 




(15) 


Similarly, for the assumption A2 we compute 

£ = (S(0) H AS(G) + diag(S)) _1 , 
w =$S{G) H Ay, 


(16) 


where a = [Si,..., a/_i, a|°° ] , a i+ 1 , a L ] T . 

The key advantages of such incremental component-wise estimation scheme are the expressions © 
and (fTB . The former permits a simpler numerical optimization of the dispersion parameters as the 
dimensionality of the resulting objective function equals to the dimensionality of 0i, rather than that of 
0. Result <Q7} gives a simple criterion for model order selection: when |/q| 2 < q, we get aj 00 ^ = oo, i.e., 
W[ —^ 0 and the component is removed. This implements an automatic model order selection. Moreover, 
the signal model can be constructed from bottom up, i.e., starting with an empty model S(&)w = 0, 
and initializing the first component using “incoherent” initialization as described in the Algorithm ([!]). 


Algorithm 1 Component initialization 


Compute ri <— y — S(@)w', estimate 6i using 


Of = argmaxj logp(6> z ) - ||rf s(0*)IIa)> 

0i L J 


(17) 


Compute q(on) using (fTIT) 
if aj°° ] is finite then 

Compute q(wi) using (fl5l) (or q(w) using 
else 

Discard the component and abort initialization 

end if 


If during the initialization the test dTTb results in a finite sparsity parameter aj 00 , a new component is 
accepted in the model. The parameters of the components are then updated following the Algorithm [2] 
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Algorithm 2 Parameter update 
while Not converged do 

for / € {1,..., L} do 

Update q{Oi) from © and q(ai ) using (fill) 
if a : |°°^ is finite then 

Update q(wi) using (fl5l) (or q(w) using ( fT6b ) 
else 

Remove the /th component from the model 

end if 
end for 
end while 


After update, the initialization can be repeated again for an updated residual signal. The algorithm is 
interrupted when no new components can be added to the model. Let us also mention at this stage that 
can be efficiently computed using rank-one updates (see lfT8l for more details). Thus, q(w) can be 
efficiently updated even for large L. 

The condition ///1 2 > q in (fTTT) we term a pruning condition since it determines if aE°° is finite. It 
forms a basis for a multipath component detector. In fact, the sparsity of the estimated model is governed 
by this condition. To better understand its properties and limitations we consider this condition in more 
details in the following section. 


III. Analysis of the pruning condition 

Let us now investigate this pruning condition in greater detail for both A1 and A2 assumptions. To 
this end we define pi = /q | 2 /q. A closer look at (fl2l) and (1 1 41 ) reveals that the parameters /// and q 
correspond, respectively, to the posterior estimate of the /th path weight wi and its variance when iq = 0. 
Thus, we can interpret pi as an estimate of the /th component SNR after the processingo Specifically, 
the pruning condition 

Pi > 1 , ( 18 ) 


states that an estimate of the approximating pdf q(ai) has a finite mean if, and only if, an estimate of 
the /th component SNR after subtracting the interference of the other L — 1 components exceeds 1 (or 

4 This can also be interpreted as the component SNR after a matched filter processing, with s(6i) playing the role of a matched 
filter. 
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equivalently 0 dB). 

Yet in many practical applications a 0 dB threshold might not represent the desired level of confidence 
in the estimated component. Moreover, we have empirically observed the condition (IT8l) generally over¬ 
estimates the model order: some of the detected components were falsely introduced into the model, with 
the estimated weights having small, yet non-zero weights and the corresponding parameters pi exceeding 
a 0 dB threshold. Empirical adjustment of the threshold to some level ki > 1 improves the model order 
estimate |24| . fl8l . ll22l . I l23ll . In what follows we explain why signal sparsity is overestimated with the 
condition (fT8l ) and how to select the threshold k; such that the conditions pi > rq is more robust against 
estimation artifacts. For this purpose we will explore a connection between the statistical structure of 
(fl8l) and hypothesis testing. 

Consider a single component /, and assume that the parameters of the other components are fixed. 
Define now two hypotheses Hq and H\ for the “true” weight wi of the Zth multipath component as 
follows: 


Hq: wi = 0 
H x : wi / 0. 


(19) 


Our goal here is to understand how statistics of pi can be utilized to choose between these two hypotheses 
in the Neyman-Pearson sense. To this end we will consider the distribution of pi under Hq and H \ 
hypotheses for both A1 and A2 assumptions. 


A. Assumption Al: independent multipath components 

We will begin our analysis with the following proposition: 


Proposition 1. Assume that 0/ is found from © and that other factors in (]6]) are fixed. Then, under 
hypothesis Hq the statistic pi will follow an extreme value distribution / I29l/ with the following pdf: 


p{pi\H 0 ) 


' (e- N H)6(pi) 

(1 — e ~ N / e )p( Pl \Ho) 


where 6 (pj) is a Dirac delta distribution and 


0 < pi < 1 
Pi > 1 


( 20 ) 


p(pi\H 0 ) = e (- ft+1 ° g(iV )-e-' ,i+IosW ), Pl > 0, (21) 

is a pdf of the Gumbel distribution !\3Q]j . 

Proof: Consider the distribution of pi under the hypothesis Hq for some arbitrary value of 9[ and 
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known noise statistics. Due to the efficiency of maximum likelihood estimators for linear models OTl . 
it is straightforward to show that m ~ CN(/q|0, q). Recall now that pi = \pi\ 2 /q. It is known that the 
square of a normally distributed zero mean random variable normalized by its variance will follow a v 2 
distribution. Since the variance of real and imaginary parts of /// is q/2, then pi will follow a sealed/ 
X 2 distribution with two degrees of freedom. In our case it is an exponential distribution with the pdf 


p{pi) = e p ‘, pi> 0. (22) 

—[All 

This distribution arises when for a fixed 9i different realizations of the residual signal rj are generated. 

—[All —[All 

Alternatively, r\ can be fixed and Gi then drawn at random. Note that under //q the residual r\ is 

a realization of an iV-dimensional Gaussian noise vector. However, due to maximization © we select 
the “best” dispersion parameter 6i out of N independent possibilities]/] As a results an observed value 
of pi under Hq will follow the distribution of a maximum out of N values drawn from (l22l) . Such type 
of distributions are known as extreme value distributions. 

To derive the distribution function F max (pi ) of the corresponding extreme value distribution, we apply 
the Fisher-Tippett-Gnedenko theorem |[29l to the distribution function F{pi) = 1 — e~ pi of the exponential 
pdf (l22l) . By the theorem, F max (p/) can be computed as the limit of appropriately shifted and scaled 
variable pp. F max (pi ) = lim^oo ^F( p ' a b " for some real sequences a n > 0 and b„ > 0 that are 
independent of pi. In our case, it can be demonstrated that for a n = 1 and b n = log(n), the maximum 
of out of N exponentially distributed values will follow a Gumbel distribution li30l F rnax (p;) with the 
distribution function 

F m M = exp (-e-^-^W)) 


and the corresponding pdf 

P(pi\H 0 ) = e (-p i+ log pi > Q. (23) 


Note, however, that for pi < 1 the sparsity parameter a|°°^ = oo. In this case the hypothesis Hq is 
automatically accepted. Taking this into consideration, the pdf p(pi\Ho ) can be specified as 


p(pi\H 0 ) 


F max (l)5(pi) 0<Pi<l 

(1 - F max (l))p(pi\H 0 ) pi > 1, 


(24) 


5 The scaling factor in this case is 1/2 to compensate for the reduced variance of real and imaginary parts. 

6 Note that possible correlations in the residual signal due to diffuse multipath are “whitened” by the matrix A -1 


March 9, 2015 


DRAFT 




14 


which completes the proof. ■ 

The next proposition defines the distribution of pi under hypothesis Hi. 


Proposition 2. Under hypothesis H \ the statistic pi will follow a scaled non-central chi-square distri¬ 
bution 


P(Pl\Hi) 


0 0 < pi < 1 

^p(pi\H 0 ) Pi > 1 


(25) 


where 


PhApi) = e ^ Pl+ )lo (a/2 rmj , 

roo 

and Z= p Hl (pi)dpi. 


(26) 


J l 

Proof: The distribution of pi under hypothesis II\ can be studied in a similai' fashion. The weight 
pi will follow a Gaussian distribution with the true (unknown) mean wi f 0 and a variance q. Following 
the same line of arguments as for the Ho case, it can be shown that pi will follow a scaled non-central 
chi-square distribution x'~A r li) with two degree of freedom and a non-centrality parameter /// = 2|w;/| 2 /q: 


p(pi\Hi) = e ( pl+V l )l 0 (v / 2 wn) , pi> 0, 


(27) 


where Io(x) is a modified Bessel function of the first kind. Since for pi < 1 the II\ hypothesis is 
automatically rejected, the support of p{pi\H\) is restricted to the interval (1, oo). Taking this into account 
leads to result (l25l ). which finalizes the proof. ■ 

Let us note that, strictly speaking, (1251) will hold for components with a sufficiently high “true” SNR 
\wi\ 2 /<q. In high SNR regime optimization ([9]) will consistently result in the same value of 6\. Yet as 
wi decreases, the corresponding residual signal r j '' 1 becomes dominated by the additive noise £ and a 
mixture of (1251 ) and (|23T ) will be observed. 

Now, we can select between Hq and H\ using the following test function T (pi): 



(28) 


E {T(ft)} = e «, 

P{pi\Ho) 


where q is the size of the test. Let us now indicate some important properties of T (pi ). 

1) The test (l28l) is uniformly most powerful (UMP) test of size e:/ to choose between Ho and II \ 
specified by pdfs (l20l) and (l25l) . respectively. This follows from the fact that the rejection region 
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of the test function T(pi), given by the interval [log(l/log(l — q) 1 / N ), oo] is independent of r/i 

m. 

2) Under assumption A1 the standard IARD algorithm implements the test ( l28l ) with m = 1, as seen 
from (fill ). 

3) Since for pi < 1 the corresponding component is automatically removed, the size e; of the test ( 1281) 
must be upper bounded. The upper bound is given by by (1 — F max (l)). 

It is important to stress that for a standard threshold m = 1, the size of the hypothesis test q will be quite 
large for typical values of N (see Fig.Q}. In other words the standard IARD will implement the test ll28l) 


cj 

CD 

N 

CD 

to 

CD 

I- 



N=10000 
N=1000 
N=100 
N=10 


8 10 12 
Threshold ^ 


Fig. 1. Size ti of the test 1281 versus threshold ni for different values of N. 


with a very high probability of false alarm. As a result, Hq will be falsely accepted more often, leading 
to estimation artifacts. Moreover, as the number of samples N increases, the probability of generating 
artifacts grows as well, making it more difficult to distinguish “true” components from noise. The reason 
for this is the optimization ([9]), which leads to the emergence of the extreme value distribution (l20l ). 
As N increases, this distribution shifts further away from the standard threshold k/ = 1, making the 
correct rejection of artifacts less probable. Naturally, by increasing the threshold ki we can control the 
probability of false detection at some desired level £/. 


B. Assumption A2: correlated multipath components 

Under the assumption A2 the pruning condition ( fT8l ) has a similar interpretation. However, due to the 
correlations between the elements of w, the corresponding analysis becomes significantly more involved. 


March 9, 2015 


DRAFT 
































































































16 


Let us begin by considering the marginal posterior of wi for the case when cq = 0. This is again 
a Gaussian pdf with the mean m and the variance q given by m. Consider now the expectation 
E{/q} = K{qsi(6i) h Ar\ A2 ^} in (fl4l) . It can be shown that 


=^(0^ (a- 1 + s(e_,)Aj5(e_ i ) ff ) 

x5(0_ 1 )r«_i + wi. 


(29) 


where we re-used definitions (fl3l) to simplify notation. By inspecting (l29l ) we see that the bias E{/q} does 
not vanish under hypothesis Ho, i.e., when wi = 0. Due to the correlations between the components, this 
bias is proportional to the “true” weights w_i, which are generally unknown. In other words, in order to 
decide between Ho and Hi within the incremental estimation apporach, i.e., for a particular component l, 
we need to known the weights of the other multipath component. This in general prohibits a computation 
of the pdf ph 0 {pi) or PHo(Pi) f° r the case A2 unless some assumptions about the true weights w_i can 
be made. 

Nonetheless, our simulations show that the test t l28l ) applied to the case A2 performs quite well. 


IV. Simulations results 

In the following we will investigate the performance of the proposed joint estimator and component 
detector for synthetic channels. 


A. One component in noise 

We will begin with a single synthetic multipath component in white noise, i.e., L = 1. For that 
we generate a channel response according to © with the following assumptions. We restrict the set 
of dispersion parameters to a single delay r, so that s(0) = s(r). The vector s(r) is constructed as 
s(t) = [s[— t/T s ], ..., s[(N — 1) — r/T s ]] T , where N = 128 and T s = Is. The signal s[n] is an OFDM 
signal with K subcarriers located at discrete frequencies 27r k/K, k = 0..... A' —1. Each subcarrier is 
generated with a constant unit magnitude and random phase uniformly drawn from the interval [0, 27r]. 
The delay r of the synthetic component is set to r = 0. The weight w has a unit magnitude and a random 
phase drawn from the interval [0, 27r]. 

Our goal in this experiment is to validate the derived distributions of the decision statistic pi for both 
Ho and H\ hypothesis. To this end we restrict the values of estimated component delays to the sampling 
instances. The estimation algorithm is then initialized with only 2 components: one with the delay set to 
the true delay r to approximate the H\ hypothesis, and the other one set to the neighboring sampling 
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instance to approximate the hypothesis Hq. To collect the corresponding statistics, we run the algorithm 
and collect the values of pi and p 2 over 10000 independent runs of the algorithm. The obtained empirical 
distributions of both statistics are then compared to the derived theoretical distributions p(pi\H 0 ) and 
p(pi\H\). For both components a pruning threshold of k\ = K 2 = 1 is used, which corresponds to the 
standard IARD pruning condition. The analysis is performed for different input SNRs that we compute 
as 101og 10 ||u>is(ti)|| 2 /||£|| 2 + 10 log 10 (-/V); here, 101og 10 (!V) is the processing gain of the estimator. 

We begin our tests for the assumption Al. For that we use K = N, which corresponds to the correlation 
coefficient of 0.007 between the components with delays located at two neighboring sampling instances. 
In Fig. |2] we plot the resulting distributions for 9dB, 13dB, 17dB, and 21dB SNR. As we see, there 
is a very good fit between the empirical and theoretical distributions under the // () hypothesis. Also, as 
expected, for low SNR the derived pdf p(pi\H\) deviates slightly from the observed empirical distribution. 

Now, let us consider the same scenario, yet for correlated components. To increase the correlation 
between the components we select K = N/ 2, K = N/ 4, K = N/8 , and K = N/ 16, which is 
equivalent to keeping the sampling rate fixed while reducing the bandwidth of the signals. This leads 
to increased correlation between closely spaced components. The correlation coefficients between two 
signals located at two consecutive delays for the above chosen values of K are 0.62, 0.89, 0.97, and 
0.99, respectively. 

In Fig. [3] we show the empirical distributions of the decision statistic for 17dB SNR and the corre¬ 
sponding pdfs p(pi\Hq) and p(pi\H\ ). Note that the latter are computed under the assumption Al. As 
expected, for low correlations the pdfs derived for the assumption Al provide a close approximation for 
the A2 case, both for Hq and Hi hypotheses. As the correlation increases, the pdfs of both hypotheses 
exhibit a second mode at the location of the alternative hypothesis. This is direct consequence of the high 
correlation between the components: depending on the noise realization, a component that is “marked” 
as an Hq hypothesis fits the synthetic signal better then the one “marked” as an H\. Practically, it is, 
however, not important which component is eventually selected, as long as the artifacts are removed 
with an appropriately selected threshold m. Considering the tails of the pdf p(pi\Ho) we can conclude 
that in the A2 case the threshold k/ computed for the Al assumption seem to be a reasonable practical 
approximation. 

Let us now test the performance of the proposed detector with the adjusted threshold For that 
we use the same simulation parameters: we generate a single component with r = 0, iV = 128, and 
T s = 1. As the performance measure we look at the number of estimated components and the empirical 
distribution of the estimated delay values versus SNR for the threshold K[ = 1, i.e., no adjustment, and 
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Fig. 2. Comparison of empirical and derived distributions of the test statistic pi under Ho and Hi hypotheses for a) SNR = 9dB, 
b)SNR = 13dB, c)SNR = 17dB, and d) SNR = 21dB. 


adjusted threshold 


Hi = log(l/log(l - ei) 1/N ) 


(30) 


with ei = 0.001. The latter is selected according to (l28l) . Also, we will consider cases K = N, K = N/2, 
K = N/ 4, and K = N/ 16. The corresponding plots are summarized in Fig. [4] As we see, with k/ = 1 


setting, the algorithm mainly detects noise in low SNR (Fig. |4(e)| - |4(h)[ ) and overestimates the number 
of components in high SNR regime. With the adjusted threshold, the number of detections at low SNR 
is almost zero, yet when a component is detected, it corresponds to the actual multipath component with 
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Fig. 3. Comparison of empirical and derived distributions of the test statistic pi under Ho and Hi hypotheses for SNR= 17dB 
and a )K = N/2, b)K = N/A, c )K = N/8, and d )K = 1V/16. 


high probability. 

B. Superresolution properties of the algorithm 

In the next simulation we investigate the resolution ability of the proposed IARD algorithm for both 
the assumptions A1 (IARD-A1) and the assumption A2 (IARD-A2). Here we will consider the case 
L = 2, with component delays 77 no longer restricted to a sampling grid. Additionally, we will consider 
a Doppler shift vi for each component. This setting will correspond to a time-varying SISO channel model 
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Fig. 4. Estimated number of components versus SNR for (a,e,i) K = N, (b,f,j) K = N/2, (c,g,k) K = N/A, and (d,h,l) 
K = Nj 16. In figures (e)-(h) ni = 1; in figures (i)-(l) k ; = log(l/log(l — ei)~ 1/IN ) with e; = 0.001. 


with stationary parameters. To estimate Doppler frequency we will consider M = 25 consecutive channel 
measurements, so that the model of a single component s(0i) is represented as s ( t /, v{) = vec { X } , where 
X is an R x M matrix and [X] rtTn = s[r — ti/T s \e> 2nUirmTa , r = 0,..., R — 1, m = 0,..., M — 1. The 
signal s[n] is a downsampled version of the actual lOMHz-wide calibration signal used in the aeronautical 
channel measurement campaign f33l . The used sampling period is T s = 4//.s, which results in I! = 128 
samples per single channel measurement. The synthetic delays r = [ti,T 2 ] t of the components are 
generated as follows: t\ is uniformly drawn from the interval [0, T s ] and T 2 = t\ + A • T s , with A being 
a simulation parameter. The Doppler frequency zq of the first component is drawn uniformly from the 
interval [—200, 200]Hz; for the second component we select zq = iq + e v , where e u is a random jitter in 
the interval [—2, 2]Hz. The weights of both components have unit magnitude and uniformly distributed 
phase drawn from the interval [0, 27 t]. For both IARD-A1 and IARD-A2 we will select the threshold 
according to (l30l ). 

For comparison purposes we will also consider a classical SAGE algorithm @ that employs Bayesian 
Information Criterium (BIC) Q to select the model order. Two different implementations of the SAGE 
algorithm with BIC criterion are compared. The first implementation (SAGE-BIC-1) exploits the signal 
detection method based on the eigenstructure of the estimated signal covariance matrix (SJ. This algorithm 
first estimates the correlation matrix of the input signal y using N = R X M data samples; then, the 


March 9, 2015 


DRAFT 




































































































































21 


information-theoretic criterion is applied to the eigenvalues of the correlation matrix following the scheme 
described in 10. This gives an estimate of the number of signals, which is then plugged in the SAGE 
algorithm to estimate signal parameters. The second implementation (SAGE-BIC-2) estimates several 
models with different number of components L using the SAGE algorithm as follows: it starts with the 
model order L = 0 and sequentially increases the model order until the minimum of the BIC criterium 
is achieved, each time fitting the model anew. The BIC criterion is evaluated as 


BIC(L) = - log (p(y|©,u>)| j 


+ -Llog(JV) 


for each possible value of L. Here log (p(y\&, w)\ is the value of the log-likelihood function evaluated 
at maximum, under assumption that the model order is L. The penalty factor |Llog(iV) arises as follows: 
penalization per single complex amplitude is log(iV), and per additional unknown time/frequency shift 
is | log(iV) (see ll34l and @ for more details). Note that in this realisation SAGE-BIC-2 requires fitting 
multiple models to find the minimum of the BIC criterion. It is thus computationally very inefficient for 
realistic channels, where L might range up to several tens of components and number of samples N is 
on the order 10 3 - 10 5 . 

As the performance criteria we compute the averaged number of detected components L, the probability 

(L=2) 

of detecting exactly two components P D , the averaged delay root median squared error (RMeSE) 
RMeSE(r) normalized by the sampling period T s , and Doppler RMeSE RMeSE(9), normalized by the 
Doppler resolution 1/NT S . The latter two quantities are computed only for the cases when a correct 
number of components is detected. Note that at low SNR the component detection rate will also be 
low, which is why the median squared error is used instead of mean squared error. Additionally, we 
evaluate the averaged computation time per single algorithm run. The corresponding plots for SNR 5dB, 
lOdB, 20dB, and 30dB are summarized in Fig. [5] The results are obtained by averaging over 2000 
independent Monte Carlo runs for the IARD-A1, IARD-A2, and SAGE-BIC-1 algorithms. The statistics 
for the SAGE-BIC-2 algorithm are averaged over 300 Monte Carlo runs. 

In terms of the estimated number of components L (Fig. 5(a) 5(d)| ), and probabilities of detection 
1 (Fig. |5(e)H5(h)T >, the IARD-A1, IARD-A2, and SAGE-BIC-2 algorithms perform quite well, 
with the latter offering a slightly better performance. The SAGE-BIC-1 algorithm performs in contrast 
quite poorly: it either underestimates the number of components in low SNR regime, or consistently 
overestimates the model order in the high SNR regime. Its performance also seems to be insensitive to 
the component spacing A. In contrast, the number of correct detection for the other algorithms grows as 
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Fig. 5. Estimation performance of the algorithm in the superresolution regime for (a-e) 5dB, (f-j) lOdB, (k-o) 20dB, and (p-t) 
30dB SNR. Shown are (a,f,k,p) the averaged number of detected components L; (b,g,l,q) the probability Pp =2 of the detecting 
exactly two components; (c,h,m,r) the normalized delay estimation RMSE, (d,i,n,s) the normalized Doppler estimation RMSE, 
and (e,j,o,t) the averaged computational time in seconds per single algorithm run. 
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A and SNR increases. 

In terms of accuracy of parameter estimation (Fig. |5(i)H5(I)| and |5(m)]5(p)[ ) we see that in low SNR 
regime, SAGE-BIC-2 performs slightly better than the other algorithms. In the high SNR regime, SAGE- 
BIC-2 and IARD-A1 perform identically well, with IARD-A2 outperforming them for small component 
spacing A - the advantage of the assumption A2 over a “simpler” assumption Al. For larger spacing A, 
i.e., when the correlation between the components decreases, this advantage, however, disappears, and 
SAGE-BIC-2, IARD-A1 and IARD-A2 deliver similar performance. 

Finally, let us consider the computational time of the algorithms (5(q) - |5(t)| ). It is interesting to note that 
although SAGE-BIC-2 has better component detection capabilities, its computational time is significantly 
higher, since multiple models with different number of components have to be estimated. SAGE-BIC-1 
algorithm is the fastest, since the model order selection is done prior to multipath parameter estimation 
- the most time-consuming part of the algorithm. The IARD-A1 and IARD-A2 algorithms are much 
faster than SAGE-BIC-2, yet they offer a compatible performance both in terms of component detection 
probabilities, as well as in the parameter estimation accuracy. For a higher number of components L the 
inefficiency of the SAGE-BIC-2 algorithm will constitute itself quite significant. 

The difference between the Al and A2 assumptions exhibits itself only for component spacing A 
below approx. 60%, i.e., in a super-resolution regime. In terms of the detection rate, both assumptions 
perform quite similarly. As expected, the parameter estimation accuracy is better for the A2 assumption, 
yet at the expense of slightly higher computational time. 


V. Conclusion 

This work discusses a joint sparse estimation and detection of multipath components within variational 
Bayesian framework. The approach is based on a variational realization of incremental automatic relevance 
determination (IARD) algorithm - a Bayesian sparse signal reconstruction technique. The variational 
Bayesian formulation of the algorithm permits extending the standard IARD algorithm for linear models 
to a problem of parameters estimation of superimposed signals, which requires nonlinear optimizations. 
The sparsity is used to estimate the number of active signals in the model. 

However, for the problem of super-resolution multipath component estimation, where an accurate 
model order selection is of a particular interest, it has been observed that IARD generally overestimates 
the number of components. Here we have demonstrated that this can be explained by the model fitting 
step at which dispersion parameters of propagation paths are estimated. This steps performs a nonlinear 
optimization that adapts the dictionary matrix of the IARD algorithm. As a consequence, the model 
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overfits the measured signal and artifacts are inserted into the model. 

To overcome this we proposed a hypothesis test that exploits statistical structure of the IARD in¬ 
ference expressions. We have shown that due to the optimization of multipath dispersion parameters, 
the corresponding sparsity parameters will follow an extreme value distribution under additive Gaussian 
noise assumption. This interpretation permits a correction of sparsity-driven model order selection within 
IARD using binary hypotheses testing. We have shown that the standard IARD approach is equivalent to a 
hypothesis test with a very high probability of false alarm, which explains model order overestimation. By 
adjusting the IARD pruning conditions to guarantee the desired false alarm probability, the model order 
selection can be improved and correct order can be estimated even in challenging super-resolution regime. 
Simulation studies have demonstrated that this adjustment allows extraction of the true signal sparsity in 
simulated scenarios and further acceleration of the convergence rate of the algorithm as compared to the 
classical information-theoretic model order selection schemes. 
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